Artificial intelligence assisted wearable

ABSTRACT

The description relates to artificial intelligence assisted wearables, such as backpacks. An example backpack may include sensors, such as a microphone and a camera. The backpack may receive a contextual voice command from a user. The contextual voice command may include a non-explicit reference to an object in an environment. The backpack may use the sensors to sense the environment, use an artificial intelligence engine to identify the object in the environment, and use a digital assistant to perform a contextual task in response to the contextual voice command. The contextual task may relate to the object in the environment. The backpack may output a response to the contextual voice command to the user.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/570,058, filed on Sep. 13, 2019, the disclosure of which is hereby incorporated by reference in its entirety.

BACKGROUND

Digital assistants are becoming more versatile due to advancements in computing. The present concepts relate to improvements in wearable digital assistants that can perform various tasks for the benefit of users.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate implementations of the present concepts. Features of the illustrated implementations can be more readily understood by reference to the following descriptions in conjunction with the accompanying drawings. Like reference numbers in the various drawings are used where feasible to indicate like elements. The accompanying drawings are not necessarily drawn to scale. In the figures, the left-most digit of a reference number identifies the figure in which the reference number first appears. The use of similar reference numbers in different instances in the description and the figures may indicate similar or identical items.

FIG. 1 illustrates an example wearable, consistent with the present concepts.

FIG. 2 illustrates example system configurations, consistent with the present concepts.

FIGS. 3A, 3B, and 4 illustrate example scenarios, consistent with the present concepts.

FIG. 5 illustrates an example image of an environment, consistent with the present concepts.

FIGS. 6 and 7 show flowcharts illustrating example methods, consistent with the present concepts.

DETAILED DESCRIPTION

The present concepts relate to an artificial intelligence assisted smart wearable that can function as a hands-free digital assistant. Moreover, the wearable may be context-aware such that a user can provide contextual commands that relate to the environment in which the user is situated, and the wearable can understand the contextual command by sensing the environment and using artificial intelligence.

Continuing advancements in artificial intelligence and the ongoing proliferation of smart devices have made digital assistants more functional and useful. However, conventional digital assistants have several drawbacks.

First, many conventional digital assistants are physically stationary. For example, many homes are equipped with digital assistant devices that can understand voice commands to operate electronic devices in the house. For example, conventional digital assistants can control the lighting, adjust the thermostat, operate televisions, adjust speaker volume, etc. However, the functionality and usefulness of such conventional digital assistants are limited to the home surroundings. Accordingly, such digital assistants are not available or useful when users are on the go or out and about.

Second, many conventional digital assistants that are available on mobile devices, such as smartphones, tablets, and laptops, require the user to divert his attention and focus away from the task at hand, because the mobile devices require manual operation using the user's hands and require the user to look at the mobile device. For instance, the user may be required to stop whatever he/she is doing; look for and take out the device from a pocket, purse, backpack, etc.; press buttons or move switches; tap or swipe touchscreens; look at the display and changing graphical user interfaces (GUIs); and/or put the device back into the user's pocket, purse, backpack, etc. These requirements make it difficult to use conventional digital assistants in many circumstances when the user is preoccupied with an ongoing task, the user's hands are occupied, and/or when the mobile device is stowed away. For example, using conventional digital assistants can be inconvenient when the user is skiing while wearing gloves and holding ski poles; when the user is biking and holding the bike handles; or when the user has stowed the device inside a pocket, purse, or backpack.

Third, conventional digital assistants are not context-aware. That is, conventional digital assistants are incapable of perceiving the user's surroundings and thus require the user to provide overly explicit commands. For example, conventional digital assistants cannot see what the user sees and cannot hear what the user hears. Accordingly, the user is required to speak to conventional digital assistants in an unnatural and cumbersome manner, contrary to how the user would normally speak to another person who can perceive the user's surroundings at the same time. Such an unnatural interaction with conventional digital assistants can discourage users from using conventional digital assistants. The lack of contextual information renders conventional digital assistants very difficult or even impossible to use in many scenarios in which the user may wish to perform certain tasks relating to the environment.

The present concepts solve the above-discussed problems associated with conventional digital assistants. First, a digital assistant consistent with the present concepts is available with a wearable that is worn by the user. Second, a user can interact with the digital assistant hands-free using voice commands and one or more of auditory feedback, visual feedback, and/or haptic feedback without distracting the user away from the current task at hand. Third, the digital assistant can perceive the user's surroundings and thus is context-aware, enabling the user to provide contextual commands to the digital assistant.

A digital assistant consistent with the present concepts has several advantages and provides many benefits to the user. The digital assistant can be with the user wherever he/she goes so long as he/she brings the wearable with him/her. The user can conveniently utilize the digital assistant using voice commands and need not free his/her hands or distract his/her eyes from whatever activity he/she is currently engaged in. Furthermore, the user can provide contextual commands that require some understanding or perception of the environment in which he/she is in, because the digital assistant is capable of sensing and interpreting the user's surroundings. The present concepts allow the user to form commands relating to the environment in a more natural way to cause the digital assistant to perform contextual actions based on the environment around the user.

FIG. 1 illustrates a wearable 100, consistent with the present concepts. In one implementation, the wearable 100 may be a backpack 101 worn on the back of a user 102, as illustrated in FIG. 1. The backpack 101 may include one or more straps 104 so that the backpack 101 can be worn over the shoulders of the user 102, as illustrated in FIG. 1. Furthermore, the backpack 101 may include handles, zippers, pockets, sleeves, or other accessories (not shown). In other implementations, the wearable 100 may be headgear (a hat, a helmet, etc.), eyewear (glasses, goggles, etc.), a bag (a purse, a briefcase, a suitcase, a tote, etc.), apparel (shoes, a jacket, an overall, etc.), an accessory (a watch, a necklace, a bracelet, an anklet, etc.), or anything else that the user 102 may wear, carry, or take with him.

Consistent with the present concepts, the wearable 100 may include one or more components. For example, the backpack 101 may include a processor (introduced relative to FIG. 2) for processing computer instructions. The backpack 101 may include storage (introduced relative to FIG. 2) that can store any kind of data, for example, computer-executable instructions and/or user data such as the user's calendar. The backpack 101 may include a battery 106 that can power various components of the backpack 101. The battery 106 may be replaceable and/or rechargeable. For instance, the battery 106 may be charged via a cable to an electrical outlet.

Consistent with the present concepts, the wearable 100 may include one or more input components (e.g., sensors) for receiving inputs from the user 102 and/or for sensing the environment around the user 102. For example, the backpack 101 may include one or more buttons 108 (e.g., switches). The buttons 108 may be used to control any components of the backpack 101. For instance, the buttons 108 may control the battery 106 to power on, power off, sleep, hibernate, and/or wake the backpack 101. The buttons 108 may be operated by pressing, long pressing, holding, double clicking, tapping, touching, squeezing, flipping, and/or rotating the buttons 108. The buttons 108 may also be used to pair the backpack 101 with another device (explained below) or to charge another device (explained below). The buttons 108 may be used to activate or provide voice commands to the digital assistant. Voice commands may include a request to perform a certain function and/or a query seeking certain information. The buttons 108 may be located on the strap 104 of the backpack 101 (as illustrated in FIG. 1), on the battery 106, or elsewhere on the backpack 101.

In some implementations of the present concepts, the backpack 101 may include one or more pressure sensors 109 (e.g., force-sensitive resistors). The pressure sensors 109 may be located anywhere on the backpack 101, such as on the straps 104 (as illustrated in FIG. 1), on the handles, at the bottom of the backpack 101, in the pockets, etc. For example, the pressure sensor 109 may be installed inside the strap 104 or at the underside of the strap 104 that contacts a shoulder of the user 102. The pressure sensors 109 may allow the backpack 101 to sense when the backpack 101 is picked up by the user 102, when the user 102 puts on the backpack 101, when the user places the backpack 101 down, and/or when a device is placed in the backpack 101 (or inside a pocket of the backpack 101). Accordingly, the backpack 101 may be configured to automatically turn on when it is worn by the user 102, automatically turn off when taken off by the user 102, and/or automatically pair with a device stowed in a pocket, based on readings from one or more pressure sensors 109.
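
As a minimal illustration of this behavior, the following Python sketch maps strap and pocket pressure readings to power and pairing actions. The threshold values, class name, and callback interface are hypothetical assumptions for illustration, not part of the described implementation.

```python
# Hypothetical sketch: mapping strap pressure readings to backpack power states.
# Threshold values and callback names are illustrative assumptions only.

WEAR_THRESHOLD = 2.0    # kg-force on the shoulder strap suggesting the backpack is worn
POCKET_THRESHOLD = 0.3  # kg-force on a pocket sensor suggesting a device was stowed

class BackpackPowerController:
    def __init__(self):
        self.powered_on = False
        self.paired_device = None

    def on_strap_pressure(self, kg_force: float) -> None:
        """Turn the backpack on when worn and off when taken off."""
        if kg_force >= WEAR_THRESHOLD and not self.powered_on:
            self.powered_on = True
            print("Backpack powered on (worn by user).")
        elif kg_force < WEAR_THRESHOLD and self.powered_on:
            self.powered_on = False
            print("Backpack powered off (taken off).")

    def on_pocket_pressure(self, kg_force: float, device_id: str) -> None:
        """Pair with a companion device placed in a pocket."""
        if kg_force >= POCKET_THRESHOLD and self.paired_device != device_id:
            self.paired_device = device_id
            print(f"Paired with stowed device {device_id}.")

if __name__ == "__main__":
    controller = BackpackPowerController()
    controller.on_strap_pressure(3.1)               # user puts the backpack on
    controller.on_pocket_pressure(0.5, "phone-01")  # phone placed in a pocket
    controller.on_strap_pressure(0.0)               # backpack set down
```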

The backpack 101 may include a camera 110 for visually sensing the environment surrounding the user 102. For example, the camera 110 may be attached to the strap 104 of the backpack 101 and may face the front of the user 102, as illustrated in FIG. 1. In one implementation, the camera 110 may be embedded inside the strap 104, such that the camera 110 is discreetly hidden and/or less noticeable. Alternative camera configurations are possible. For instance, one or more cameras 110 may be positioned to face the rear, sides, down, and/or up above the user 102. For example, cameras 110 may be positioned on each of the straps 104 to capture the environment in a direction the user's body is facing. Another camera 110, such as a fish eye camera, may be positioned on the straps 104 to capture the user's face (e.g., where the user 102 is looking). Thus, the cameras 110 can collectively provide data about the orientation of the user's body and the user's head.

The camera 110 may be fixed relative to the strap 104 such that the direction in which the camera 110 points changes as the strap 104 moves. In alternative implementations, the camera 110 may be a gimbal camera that can be freely rotated about one or more axes. For example, the direction in which the camera 110 points may remain parallel to the ground (or perpendicular to the direction of gravity) even as the strap 104 moves. Accordingly, the camera 110 may have the same or similar viewpoint as the user 102 and be able to perceive the same or similar environment as the user 102 is perceiving. In other implementations, the camera 110 may be motorized to rotate, tilt, and/or pan the direction in which the camera 110 points. For instance, the backpack 101 may include one or more infrared sensors attached to the straps 104 and pointing upward, such that the infrared sensors can determine the direction the user's head is facing. Accordingly, the camera 110 may be configured to turn in response to the user's head turning, such that the camera 110 can perceive the same part of the surroundings that the user is currently looking at.

The camera 110 may capture an image recording of the environment and/or capture a video recording of the environment. The camera 110 may record in black and white, capture visible light using an RGB color model, or capture a non-visible light spectrum such as infrared, ultraviolet, x-ray, etc. The camera 110 may be equipped with night vision capabilities. The recordings made by the camera 110 may be stored in the storage of the backpack 101. The recordings may be transmitted to a device (e.g., a server) for storage and/or processing.

The camera 110 may be activated and/or deactivated manually by the user 102 using the button 108, by the user 102 using voice commands, by the digital assistant without explicit input from the user 102, and/or by any other trigger such as scheduled triggers, event triggers, or environmental triggers detected by one or more input components. Alternatively, the camera 110 may be configured (e.g., with the user's consent and/or based on the user's preferences) to be constantly on and recording the user's environment.

Consistent with the present concepts, the backpack 101 may include a microphone 112 for auditorily sensing the environment near the user 102. The microphone 112 may be attached to the strap 104 near the user's mouth to detect the user's speech, as illustrated in FIG. 1. Alternatively or additionally, one or more microphones 112 (e.g., stereo microphones) may be attached to other portions of the backpack 101. In addition to detecting the user's speech, the microphone 112 may also detect other sounds in the user's environment, such as other people's speech, animal sounds, music, traffic sounds, etc. The microphone 112 may generate an audio recording and may store the audio recording in the storage of the backpack 101. The audio recording may be transmitted to a device (e.g., a server) for storage and/or processing. The microphone 112 may operate in connection with the camera 110 to create a video recording of the environment that includes image frames and corresponding audio.

The backpack 101 may include any other input components for sensing the environment of the user 102. For example, the backpack 101 may include a global positioning system (GPS) unit for sensing the geographical location, including elevation, of the user 102. In one implementation, the backpack 101 may automatically turn on when it leaves the user's home according to the location information provided by the GPS unit. The backpack 101 may include a compass for sensing the cardinal direction that the backpack 101 (and hence, the user 102) is facing and/or moving towards. The backpack 101 may include an accelerometer for sensing the acceleration, movement, and/or orientation of the backpack 101 (and the user 102). In one implementation, the backpack 101 may automatically sleep or turn off when the accelerometer senses that the user 102 has taken off the backpack 101 and has placed it on the floor, for instance. The backpack 101 may include a thermometer for sensing the ambient temperature of the user's surroundings and/or for sensing the body temperature of the user 102. The backpack 101 may include a barometer for sensing the ambient pressure and/or elevation of the user 102. The backpack 101 may include one or more biometric sensors for sensing the user's heart rate, blood pressure, body temperature, blood sugar level, etc. The backpack 101 may include a radio frequency identification (RFID) sensor for detecting RFID-tagged objects. In one implementation, the backpack 101 may be aware of the contents in the backpack 101 that are RFID-tagged. The backpack 101 may be equipped with any number and any type of sensors, such that the backpack can perceive the user 102 and his environment, and understand contextual commands from the user 102.
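
One way to think about this sensor suite is as a single snapshot of the sensed environment that the backpack could assemble before interpreting a contextual command. The sketch below is an illustrative data structure only; the field names, units, and example values are assumptions rather than part of the description.

```python
# Hypothetical sketch: a combined snapshot of the sensed environment.
# Field names, units, and defaults are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class EnvironmentSnapshot:
    latitude: Optional[float] = None                 # from the GPS unit (degrees)
    longitude: Optional[float] = None
    elevation_m: Optional[float] = None              # from GPS and/or barometer
    heading_deg: Optional[float] = None              # from the compass (0 = north)
    acceleration: Optional[Tuple[float, float, float]] = None  # accelerometer (m/s^2)
    ambient_temp_c: Optional[float] = None           # from the thermometer
    heart_rate_bpm: Optional[int] = None             # from a biometric sensor
    rfid_tags: List[str] = field(default_factory=list)  # RFID-tagged contents
    image_path: Optional[str] = None                 # most recent camera frame
    audio_path: Optional[str] = None                 # most recent microphone clip

snapshot = EnvironmentSnapshot(latitude=47.61, longitude=-122.33,
                               heading_deg=270.0, rfid_tags=["water-bottle"])
print(snapshot)
```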

The user's privacy can be protected in many ways. For example, the sensors may be enabled to sense the user's environment only if the user 102 provides express consent. Furthermore, recordings of the user's environment may be created only temporarily (i.e., stored just long enough to perform the command provided by the user 102 and then deleted) rather than stored persistently. Additional privacy and security procedures can be implemented to safeguard the user's privacy. For instance, the backpack 101 may encrypt any user-specific data and/or include a security lock (e.g., password, fingerprint, voice recognition, etc.).
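
A minimal sketch of the temporary-recording idea follows; it assumes a simple delete-after-use pattern and is not a complete privacy or security implementation.

```python
# Hypothetical sketch: keep a recording only long enough to serve one command.
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def ephemeral_recording(suffix: str = ".wav"):
    """Yield a temporary recording path that is always deleted afterward."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        yield path            # sensors write here; the command is processed
    finally:
        if os.path.exists(path):
            os.remove(path)   # recording removed once the command is handled

with ephemeral_recording() as clip:
    # e.g. record_microphone(clip); interpret_command(clip)  -- hypothetical calls
    print(f"Processing command from {clip}")
# The temporary file no longer exists here.
```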

Consistent with the present concepts, the wearable 100 may include one or more output components for providing outputs to the user 102, such as responses and/or feedback based on the user's inputs. For example, the backpack 101 may include a speaker 114. The speaker 114 may produce auditory outputs to the user 102, such as the digital assistant's voice, beeps, rings, acoustic tones and alerts, music, error messages, etc. The speaker 114 may be located on the strap 104 of the backpack 101 to be proximate to the user's ears, as illustrated in FIG. 1, or the speaker 114 may be located elsewhere on the backpack 101. The backpack 101 may include multiple speakers 114, for example, to provide stereo sounds to the user 102, to provide ambient sounds into the environment, and/or to direct auditory outputs to another person in front of the user 102. Alternatively or additionally, the backpack 101 may include an audio jack for plugging in a speaker, a headset, or earphones. The plug-in speaker may be powered by its own battery or be powered by the battery 106 of the backpack. Alternatively or additionally, the backpack 101 may be capable of wirelessly connecting to a wireless speaker, such as a wireless headset, wireless earphones, and/or wireless earbuds. The wireless speaker may connect to the backpack 101 via one or more wireless communication protocols, such as Bluetooth, Wi-Fi, or infrared.

The backpack 101 may include a display for providing visual outputs to the user 102. For example, the backpack 101 may include a light emitting diode (LED) display 116 that acts as an indicator light. The LED display 116 may be attached to the strap 104, as illustrated in FIG. 1. The LED display 116 may illuminate different colors, different brightness intensities, different numbers of LED bulbs in an array, and/or display text and/or images. In various example implementations, the LED display 116 may indicate to the user 102 that the backpack 101 is on, the battery 106 is charging, the battery 106 is low, the battery 106 has a certain percentage of charge remaining, the digital assistant is listening, the digital assistant is talking, the camera 110 is active, the microphone 112 is active, the backpack 101 is paired with another device, an error has occurred, or any other status relating to the backpack 101. For instance, the LED display 116, when activated, may signal to other people that the user 102 is talking to the digital assistant. Other types of displays, such as TVs, monitors, display screens, touchscreens, flexible displays, such as organic light emitting diodes (OLEDs), halogen lights, and/or fluorescent lights, may be included in the backpack 101 for displaying lights, text, and/or images to the user 102, and/or for providing visual feedback to the user 102. For example, a display screen may act as a dashboard that shows the user 102 various information about the status of the backpack 101. The display screen may also show the user 102 what the camera 110 is seeing. Moreover, the LED display 116 may be turned on at high intensity to act as a flashlight when the user 102 is in a dark environment. The LED display 116 may activate in connection with the camera 110 to act as a camera flash to illuminate the environment that the camera 110 is recording. The backpack 101 may also include an infrared light emitter to enable the camera 110 to record in the dark using night vision technology.

The backpack 101 may include a haptic component 118 (e.g., a haptic actuator) for providing haptic (e.g., tactile and/or vibrational) outputs to the user 102. In one implementation, the haptic component 118 may be located on one or more straps 104 of the backpack 101 to provide haptic sensations to the user's shoulder(s). Additionally or alternatively, the backpack 101 may include haptic components 118 near the front of the backpack 101 that rests against the user's back, near the bottom of the backpack 101 that rests against the user's waist, or any other location on the backpack 101. The haptic component 118 may be used to provide any type of response, feedback, indication, and/or direction to the user 102. For example, the haptic components 118 can produce various haptic sensations (e.g., short tap, long vibration, various pulses, etc.) to signal to the user 102 that the backpack 101 has turned on, the battery 106 is low, the digital assistant is listening, an error has occurred, the backpack 101 has paired with another device, the camera 110 has been activated, or the user 102 should turn left or right depending on whether the haptic sensation is produced on the left side or right side of the user's body (i.e., shoulders, back, and/or waist). Haptic feedback generated by the haptic component 118 may be more helpful than auditory feedback generated by the speaker 114 where the user 102 is in an environment with a lot of background noise such that auditory feedback may be difficult to hear, or in a very quiet environment where discreet non-auditory feedback is desirable.

The present concepts may utilize an artificial intelligence engine (introduced below relative to FIG. 2), for example, to interpret voice commands from the user 102, sense the environment surrounding the user 102, perform tasks in response to the voice commands, and/or generate outputs to the user 102. In one implementation, the artificial intelligence engine may include one or more modules (explained below) for performing a certain related set of computerized tasks. These modules may be implemented using software and/or hardware. The modules in the artificial intelligence engine may use neural networks and/or other machine learning techniques. Consistent with the present concepts, the artificial intelligence engine can augment the backpack 101 into a smart ambient device that is aware of the environment (e.g., can know where the user 102 is, can see what the user 102 sees, and can hear what the user 102 hears), enabling the backpack 101 to perform contextual tasks based on the environmental context associated with voice commands.
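
The modular structure described above might be composed roughly as in the following sketch. The module names mirror those introduced later in this description, but the interfaces and stub behaviors are assumptions for illustration only, not the described implementation.

```python
# Hypothetical sketch: an artificial intelligence engine composed of modules.
# Interfaces and stub behaviors are illustrative assumptions.
class SpeechRecognitionModule:
    def transcribe(self, audio_path: str) -> str:
        return "add this to my calendar"      # stub transcription

class ComputerVisionModule:
    def identify_objects(self, image_path: str) -> list:
        return ["poster"]                     # stub detection

class CognitiveModule:
    def plan_task(self, command_text: str, objects: list) -> dict:
        # A real module would use rules and/or machine learning here.
        return {"task": "create_calendar_event", "subject": objects[0]}

class ArtificialIntelligenceEngine:
    def __init__(self):
        self.speech = SpeechRecognitionModule()
        self.vision = ComputerVisionModule()
        self.cognition = CognitiveModule()

    def handle(self, audio_path: str, image_path: str) -> dict:
        text = self.speech.transcribe(audio_path)
        objects = self.vision.identify_objects(image_path)
        return self.cognition.plan_task(text, objects)

engine = ArtificialIntelligenceEngine()
print(engine.handle("command.wav", "frame.jpg"))
```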

Consistent with the present concepts, the wearable 100 may be capable of connecting with a companion device 120 for exchanging data and/or sharing resources. The companion device 120 may be any electronic device, such as a personal computer, a laptop, a tablet, a smartphone, a personal digital assistant (PDA), a camera, a virtual reality headset, an Internet of things (IoT) device, a vehicle (e.g., a car, a motorcycle, a bicycle, a scooter, etc.), an appliance, a television, a wearable such as a watch or glasses, etc.

The companion device 120 may contain data, such as the user's account information, the user's calendar, the user's address book, the user's task list, the user's shopping list, encyclopedias, databases of songs, databases of images, databases of maps, databases of business directories, current and historical weather data, etc. The companion device 120 may contain resources, such as a processor, a storage, a network interface, applications, a battery, a button, a camera, a microphone, a GPS, a compass, an accelerometer, a thermometer, a barometer, a biometric sensor, a speaker, a display, a haptic actuator, etc. The companion device 120 may include computer programs (e.g., the artificial intelligence engine or the modules thereof). These and other aspects are described in more detail below relative to FIG. 2.

FIG. 2 illustrates example system configurations of an artificial intelligence assisted wearable system 200, consistent with the present concepts. For purposes of explanation, the artificial intelligence assisted wearable system 200 shown in FIG. 2 may include the wearable 100 (e.g., the backpack 101), the companion device 120 (e.g., a laptop), and/or one or more servers 206. The companion device 120 and/or the server 206 may be any type of electronic device capable of performing their respective functions described above, consistent with the present concepts. The companion device 120 and the server 206 may even be included in the same device. The server 206 may include a server computer, a personal computer, a laptop, a video game console, a tablet, a smartphone, or any computer device having storage resources and processing resources. The number of devices and the client-versus-server type of the devices described and depicted are intended to be illustrative and non-limiting.

The wearable 100, the companion device 120, and/or the server 206 may be connected (via wire or wirelessly) to one or more of a first network 208, a second network 210, and/or a third network 212 (directly or indirectly) to communicate with one another and/or access the Internet. Although the first network 208, the second network 210, and the third network 212 have been illustrated as separate networks in FIG. 2, any combination of the three networks may be one and the same.

FIG. 2 also shows a first configuration 214 and a second configuration 216 that can be employed by any or all of the wearable 100, the companion device 120, and/or the servers 206. The first configuration 214 and the second configuration 216 will be explained in reference to the server 206. The first configuration 214 may represent an operating system (OS) centric configuration. The second configuration 216 may represent a system on a chip (SoC) configuration. The first configuration 214 can be organized into one or more applications 218, an operating system 220, and hardware 222. The second configuration 216 may be organized into shared resources 224, dedicated resources 226, and an interface 228 therebetween. In either the first configuration 214 or the second configuration 216, the server 206 can include a storage 230 and a processor 232. The server 206 can also include an artificial intelligence engine 234.

In the second configuration 216, the functionality provided by the server 206 can be integrated on a single SoC or multiple coupled SoCs. The processor 232 can be configured to coordinate with the shared resources 224, such as the storage 230, etc., and/or the dedicated resources 226, such as the artificial intelligence engine 234 configured to perform certain specific functionality. Thus, the term “processor” as used herein can also refer to central processing units (CPUs), graphical processing units (GPUs), controllers, microcontrollers, processor cores, or other types of processing devices.

The term “device,” “computer,” or “computing device” as used herein can mean any type of device that has some amount of processing capability and/or storage capability. Processing capability can be provided by one or more hardware processors that can execute data in the form of computer-readable instructions to provide a functionality. Data, such as computer-readable instructions and/or user-related data, can be stored on storage, such as storage that can be internal or external to the device. The storage can include any one or more of volatile or non-volatile memory, hard drives, flash storage devices, optical storage devices (e.g., CDs, DVDs, etc.), and/or remote storage (e.g., cloud-based storage), among others. As used herein, the term “computer-readable media” can include transitory propagating signals. In contrast, the term “computer-readable storage media” excludes transitory propagating signals. Computer-readable storage media may include computer-readable storage devices. Examples of computer-readable storage devices may include volatile storage media, such as random-access memory (RAM), and non-volatile storage media, such as hard drives, optical discs, and flash memory, among others.

Generally, any of the functions described herein can be implemented using software, firmware, hardware (e.g., fixed-logic circuitry), or a combination of these implementations. The term “component” as used herein generally represents software, firmware, hardware, whole devices or networks, or a combination thereof. In the case of a software implementation, for instance, these may represent program code that performs specified tasks when executed on a processor (e.g., CPU or CPUs). The program code can be stored in one or more computer-readable memory devices, such as computer-readable storage media. The features and techniques of the component are platform-independent, meaning that they may be implemented on a variety of commercial computing platforms having a variety of processing configurations.

As mentioned above, the artificial intelligence engine 234 may include one or more modules for performing a certain related set of computerized tasks. For example, the artificial intelligence engine 234 may include a voice recognition module. When the user 102 talks into the microphone 112, the voice recognition module may receive a recording of the user's speech and use artificial intelligence to recognize the voice of the user 102, as distinguished from the voices of other people, for identification and/or authentication purposes. For instance, the backpack 101 may include a security feature that allows it to be used only by the user 102 or only by a certain set of authorized users. The backpack 101 may be trained to recognize the voice of the user 102, for example, when the user 102 first uses or registers to use the backpack 101. Moreover, the backpack 101 may behave differently (e.g., through preference settings and/or based on past historical use of the backpack 101 by different users) depending on which user is using the backpack 101.

The artificial intelligence engine 234 may include a speech recognition module for interpreting the voice commands provided by the user 102. For example, the speech recognition module may convert the audio recording of the voice command from the user 102 into text using artificial intelligence. The speech recognition module may be generically trained using speech from a large population and/or specifically trained using speech from the user 102.

The artificial intelligence engine 234 may include an audio recognition module for recognizing and identifying various types of sounds. For example, the audio recognition module may be trained to recognize musical songs; musical instruments; sounds of animals and insects; sounds of vehicles; sounds of machines and tools; or any other sounds that could exist in the environment.

The artificial intelligence engine 234 may include a computer vision module. The computer vision module may be capable of receiving image recordings and/or video recordings of the environment surrounding the user 102 (for example, captured by the camera 110) and recognizing the contents of the recordings. For example, the computer vision module may include an image recognition module, a text recognition module, and/or a facial recognition module. The image recognition module may be trained to recognize and identify various objects or entities in the environment around the user 102, such as people, animals, plants, insects, and inanimate objects like cars, rocks, buildings, etc. The text recognition module may be trained to recognize and interpret text and other symbols. The facial recognition module may be trained to recognize and identify people's faces and animal faces.

The artificial intelligence engine 234 may include a cognitive module for interpreting and understanding the semantics, logic, reason, and/or purpose of the voice commands provided by the user 102. The cognitive module may use a combination of rules and machine learning to decipher voice commands, for example, to determine one or more tasks to be performed in response to the voice commands.

Consistent with the present concepts, the user 102 may provide a contextual voice command that includes a contextual signal relating to any contextual information (e.g., an object) in the environment surrounding the user 102. For example, a contextual signal may be any non-explicit reference (e.g., a contextual reference or a contextual cue), including pronouns (e.g., “this,” “that,” “it,” “him,” “her,” “what,” “who,” “these,” “both,” etc.), as well as other indirect references (e.g., “here,” “one,” “some,” etc.). The cognitive module may recognize that the voice command is contextual and may determine the proper context using one or more input components to sense the environment and then interpreting the environment using one or more of the modules (e.g., the computer vision module) in the artificial intelligence engine 234. These techniques will be described in more detail below with reference to later figures.
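
As a simple illustration, a cognitive module might flag a command as contextual by scanning for the kinds of non-explicit references listed above. The word list and keyword matching below are simplifying assumptions; a module consistent with the present concepts could instead rely on trained language models rather than a fixed list.

```python
# Hypothetical sketch: flagging contextual signals in a voice command transcript.
# The signal list and tokenization are simplifications for illustration.
import re

CONTEXTUAL_SIGNALS = {
    "this", "that", "it", "him", "her", "what", "who",
    "these", "both", "here", "one", "some",
}

def find_contextual_signals(command_text: str) -> list:
    """Return the non-explicit references found in a command transcript."""
    tokens = re.findall(r"[a-z']+", command_text.lower())
    return [token for token in tokens if token in CONTEXTUAL_SIGNALS]

def is_contextual(command_text: str) -> bool:
    return bool(find_contextual_signals(command_text))

print(find_contextual_signals("Hey Assistant, add this to my calendar"))  # ['this']
print(is_contextual("Turn the volume up"))                                # False
```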

The artificial intelligence engine 234 may include a digital assistant module for performing computerized tasks (e.g., consisting of computerized actions) for the benefit of the user 102. For example, the digital assistant module may perform a task that the cognitive module determined should be performed in response to a voice command from the user 102. The digital assistant module may have access to information, such as a digital calendar of the user 102, a digital address book of the user 102, a digital shopping list of the user 102, a digital task list of the user 102, the Internet, encyclopedias, search engines, databases of songs, databases of images, databases of maps, databases of business directories, current and historical weather data, etc. The digital assistant module may perform any computerized task so long as it has access to read and/or write the required data for the task. Examples of tasks that the digital assistant module can perform may include adding events to the user's calendar, removing a merchandise item from the user's shopping list, reminding the user 102 about an incomplete task in his task list, answering a question from the user 102 by accessing one or more data sources, sending a text message, reading an email, etc.
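
The following sketch shows one way a digital assistant module could dispatch the kinds of tasks named above to the data it can reach. The task names, dictionary format, and in-memory stores are illustrative assumptions standing in for real calendar, shopping list, and task list services.

```python
# Hypothetical sketch: a digital assistant module dispatching contextual tasks.
# Task names and in-memory stores are illustrative assumptions.
class DigitalAssistantModule:
    def __init__(self):
        self.calendar = []                       # stands in for the user's calendar
        self.shopping_list = ["apples", "hammer"]
        self.task_list = []

    def perform(self, task: dict) -> str:
        if task["task"] == "create_calendar_event":
            self.calendar.append(task["details"])
            return f"Got it. I added {task['details']['title']} to your calendar."
        if task["task"] == "remove_from_shopping_list":
            item = task["item"]
            if item in self.shopping_list:
                self.shopping_list.remove(item)
                return f"{item.capitalize()} has been checked off your shopping list."
            return f"There is no {item} in your shopping list."
        if task["task"] == "add_task":
            self.task_list.append(task["description"])
            return "I added that to your task list."
        return "Sorry, I can't do that yet."

assistant = DigitalAssistantModule()
print(assistant.perform({"task": "remove_from_shopping_list", "item": "hammer"}))
```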

The digital assistant module may generate outputs to the user 102. The outputs may be a vocal answer to a question from the user 102, an acknowledgment of a command from the user 102, feedback that a task was performed, a request for information from the user 102, etc. The output may be an auditory output produced by the speaker 114, a visual output produced by the LED display 116, and/or a haptic output produced by the haptic component 118. Other types of outputs are possible.

In one implementation of the present concepts, the wearable 100 may be a standalone device. For example, the backpack 101 may include the input components, the output components, and the artificial intelligence engine 234 (or have access to the same). That is, the storage in the backpack 101 may include computer programs to operate the input components and the output components as well as computer programs that implement or access the artificial intelligence engine 234. The backpack 101, as a standalone device or operating in a standalone mode, may sense and interpret the environment surrounding the user 102, interpret voice commands from the user 102, execute tasks in response to the voice commands, and output responses to the user 102.

In one implementation, the wearable 100 may include a network interface to connect to one or more networks (e.g., the first network 208, the second network 210, and/or the third network 212) and to communicate with other devices (e.g., the companion device 120, the server 206, or any other device). For example, the backpack 101 may include a wired network interface (e.g., an Ethernet interface) and/or a wireless network interface (e.g., a cellular network interface, a Wi-Fi network interface, a Bluetooth network interface, and/or a near field communication (NFC) interface). The cellular network interface may use a subscriber identity module (SIM) card and enable the backpack 101 to access the Internet through a cellular data network when the user 102 is anywhere within a cellular network coverage area.

In one implementation, the backpack 101 may use the network interface to access data (including computer programs) that is stored remotely. For example, the user's calendar, address book, task list, and/or shopping list may be stored in a remote server (e.g., a cloud storage), such as the server 206, that is accessible by the backpack 101 via the Internet. The backpack 101 may also access other remote data sources (e.g., encyclopedias, search services, databases, etc.) using the network interface. In some implementations, the artificial intelligence engine 234 (or certain modules thereof) may be located in a remote server, such as the server 206, that is accessible by the backpack 101 via the network interface. These implementations may be more feasible and appropriate where the artificial intelligence engine 234 (or certain modules thereof) requires more processing power, storage, and/or other resources than can fit in the wearable 100.

In some implementations consistent with the present concepts, the wearable 100 may operate in an offline mode. For example, the backpack 101 may operate in offline mode without using any of its network interfaces and thus without connecting to a network or communicating with other devices. The backpack 101 in the offline mode may have limited capabilities that do not require communicating with any other devices (including the servers 206). For example, where the speech recognition module is housed in the backpack 101 but the computer vision module is hosted by the server 206, the backpack 101 operating in the offline mode may be able to interpret voice commands from the user 102 and execute certain tasks that do not require any computer vision capabilities while unable to execute other tasks that require the computer vision module.
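
A simple way to model this behavior is to gate each task on where its required modules live, as in the sketch below. The module placement and task requirements shown are assumptions chosen to mirror the example in the preceding paragraph.

```python
# Hypothetical sketch: gating tasks on module availability in offline mode.
# Module placement and task requirements are illustrative assumptions.
MODULE_LOCATION = {
    "speech_recognition": "backpack",   # on-device
    "computer_vision": "server",        # remote only
    "digital_assistant": "backpack",
}

TASK_REQUIREMENTS = {
    "set_reminder": {"speech_recognition", "digital_assistant"},
    "identify_poster": {"speech_recognition", "computer_vision", "digital_assistant"},
}

def can_run(task: str, online: bool) -> bool:
    """A task runs offline only if every required module is local to the backpack."""
    required = TASK_REQUIREMENTS[task]
    if online:
        return True
    return all(MODULE_LOCATION[module] == "backpack" for module in required)

print(can_run("set_reminder", online=False))      # True: all modules on-device
print(can_run("identify_poster", online=False))   # False: needs remote vision module
```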

The backpack 101 may connect (via wire or wirelessly) with the companion device 120 to exchange data and/or share resources. For example, the backpack 101 may physically connect to the companion device 120 via a cable. The backpack 101 may wirelessly connect to the companion device 120 using a wireless protocol, such as the Wi-Fi protocol or the Bluetooth protocol. The backpack 101 may include a pocket, a pouch, a compartment, a sleeve, a strap, a docking station, a Velcro attachment, a magnetic attachment, a clip, a slot, or any other type of holder for the companion device 120. The battery 106 in the backpack 101 may power and/or charge the companion device 120. For example, the backpack 101 may charge the user's smartphone. Additionally or alternatively, the companion device 120 may power the backpack 101 and/or charge the battery 106. For example, the user's laptop may charge the battery 106. The exchange of power between the backpack 101 and the companion device 120 may be via a cable (e.g., a universal serial bus (USB) cable) or via wireless charging.

Consistent with the present concepts, the wearable 100 may utilize resources available on the companion device 120 and/or the servers 206 to provide a digital assistance experience for the user. For example, in one implementation, the backpack 101 may connect to the Internet using the companion device's cellular network connection in order to use the artificial intelligence engine 234 available on the server 206. In another implementation, the backpack 101 may use a GPS unit of the companion device 120 to determine the location of the user 102 rather than having its own built-in GPS. In another implementation, the backpack 101 may use a speaker on the companion device 120 to output auditory feedback to the user 102 instead of having or using its own built-in speaker 114. In another implementation, the backpack 101 may use an audio recognition module available on the companion device 120 to identify songs. In another implementation, the backpack 101 may use the processing resources and/or the storage resources of the companion device 120 to generate video recordings of the environment. In another implementation, the backpack 101 may use a display screen of the companion device 120 to show the user 102 what the camera 110 is seeing. In other implementations, the user's calendar may be stored in the companion device 120 or stored in the server 206. It should be apparent that numerous permutations of configurations are possible, where any of the input components, any of the output components, any of the modules of the artificial intelligence engine 234, any of the processing resources, any of the storage resources, any of the power resources, and/or any of the data sources can be located in the backpack 101, in the companion device 120, and/or in the servers 206.

In one implementation, the backpack 101 may identify and/or authenticate the user 102 based on the companion device 120 that is connected to the backpack 101. For instance, the companion device 120 may include a device identification or a user identification that is associated with the user 102 and is communicated to the backpack 101. Accordingly, if a different user with her own companion device wears the backpack 101, the backpack 101 can determine that a different user is now wearing the backpack 101 based on the identification of the different companion device.

FIGS. 3A and 3B illustrate an example scenario, consistent with the present concepts. Several example scenarios will be explained to illustrate the functionalities, operations, versatility, and benefits of the present concepts. In FIG. 3A, the user 102 may be carrying the backpack 101. The user 102 may be in an environment 300 of a ski resort. That is, the environment 300 may include ski slopes 302, ski lifts 304, ski lift poles 306, mountains 308, trees 310, etc. The user 102 may be wearing winter apparel including a jacket 312, gloves 314, boots 316, etc. In this example scenario, the user 102 may be on skis 318 and holding ski poles 320 with his hands.

Suppose that the user 102 in this example scenario is unsure which way to ski in order to stay in bounds. Conventionally, the user 102 may need to stop skiing, release the ski poles 320 from his hands, take off the gloves 314 from his hands in the frigid weather, reach into his pocket or backpack to pull out his smartphone using his shaking bare hands, and manually use a map app or a conventional digital assistant to determine which way he should go to stay in bounds. Such conventional actions require the user 102 to stop his current activity (i.e., skiing), take his eyes off his surroundings, and focus his eyes and attention at a smartphone display screen. Even using a conventional digital assistant with voice command capabilities, the user 102 would need to explicitly convey information about the environment 300, such as his location, the name of the ski resort, and/or the cardinal direction (i.e., north, south, east, and west) he is currently facing. These conventional actions may be difficult for the user 102 in the environment 300 to perform.

On the contrary, consistent with the present concepts, the user 102 may conveniently ask the backpack 101, “Can I ski this direction?” In this example, the microphone 112 on the backpack 101 may record the user's voice command and send the audio recording to the speech recognition module. The speech recognition module may then interpret the audio recording of the voice command into text. The cognitive module may interpret the text transcript of the voice command and recognize that the pronoun “this” is a contextual signal, and therefore, the voice command provided by the user 102 is a contextual voice command that references the environment 300 surrounding the user 102. Accordingly, the backpack 101 may attempt to perceive the environment 300 in one or more ways. For example, the camera 110 on the backpack 101 may be activated to record the environment 300. Where the camera 110 faces the front of the backpack 101, the camera 110 may be pointing in the same direction that the user 102 is facing. In this example scenario, the camera 110 may capture an image recording of the environment 300 including, for example, the ski slopes 302, the ski lifts 304, the ski lift poles 306, the mountains 308, the trees 310, etc. The backpack 101 may use the compass to determine the cardinal direction that the user 102 is facing. The backpack 101 may use GPS to determine the geographical location of the user 102. By sensing the environment 300 in one or multiple ways, the backpack 101 may interpret and understand that the pronoun “this” in the voice command provided by the user 102 is referring to a specific cardinal direction (e.g., west) from a specific geographical location where the user 102 is standing. Accordingly, the backpack 101 may determine that the user 102 is at a specific ski resort, may obtain the slopes map for that specific ski resort by accessing a ski resort map database, determine which direction the user 102 should ski to stay in bounds, and answer the user's question by formulating an appropriate response. For example, as shown in FIG. 3B, the backpack 101 may use the speaker 114 to produce an auditory response: “No. That direction is out of bounds. Ski to your right to stay in bounds.”
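
The flow described in this scenario could be condensed into the short pipeline below. The map-boundary data, bearing arithmetic, and function signature are hypothetical stand-ins intended only to illustrate how the sensed heading and location might resolve the pronoun “this” into an answer.

```python
# Hypothetical sketch: resolving "Can I ski this direction?" from sensed context.
# The resort boundary data and bearing check are illustrative assumptions.
def resolve_direction_query(heading_deg: float, latitude: float, longitude: float,
                            in_bounds_bearings: range) -> str:
    """Answer whether the user's current heading stays within resort bounds."""
    # "this direction" is resolved to the compass heading the user is facing.
    if int(heading_deg) % 360 in in_bounds_bearings:
        return "Yes, that direction stays in bounds."
    return "No. That direction is out of bounds. Ski to your right to stay in bounds."

# Assume a resort map database reports that bearings 150-260 degrees are in bounds
# at this location; the coordinates below are made-up example values.
answer = resolve_direction_query(heading_deg=270.0,
                                 latitude=39.60, longitude=-106.36,
                                 in_bounds_bearings=range(150, 261))
print(answer)
```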

Therefore, consistent with the present concepts, the user 102 is able to interact with the backpack 101 to obtain information about the environment 300 by simply speaking a contextual voice command. The user 102 need not stop skiing, need not let go of the ski poles 320, need not take off the gloves 314, need not take his eyes off the slopes (including the trees 310 that he needs to avoid), and need not take his attention away from his current activity. Furthermore, the backpack 101 is context-aware (i.e., can sense and perceive the environment 300) and thus is capable of understanding contextual commands that refer to the environment 300. Thus, the user 102 and the backpack 101 are able to speak to each other naturally using contextual language, such as “this direction” and “that direction,” because both the user 102 and the backpack 101 can perceive the same environment together.

FIG. 4 illustrates another example scenario, consistent with the present concepts. In FIG. 4, the user 102 may be carrying the backpack 101 and standing in front of a poster 400. The poster 400 may be advertising a Beatles concert at the Candlestick Park on Monday, August 29 at 8:00 pm, which costs $5.00 per ticket. The user 102 may wish to add an appointment in his calendar for this event, because he plans to attend the concert.

Conventionally, a user may typically have to free his hands (e.g., take off his gloves), reach into his pocket or backpack to pull out his smartphone, launch a calendar application, create a new appointment, manually type in the details of the event (e.g., the title of the event, the venue, date and time, and any notes he may wish to add, such as the entry fee), and save the new appointment for the event. Such conventional practices may be so tedious, cumbersome, and time-consuming that users may be deterred from using the calendar and just forego creating the appointment altogether. Even if a busy user in a hurry can pull out his smartphone and take a photo of the poster 400 so that he can create an appointment in his calendar later by referring back to the photo, the user would still have to manually enter the details of the appointment. Moreover, even if the user's smartphone is equipped with a conventional voice-activated digital assistant, the user would have to formulate an overly explicit command and provide complete details relating to the command without error. For example, the user would have to say, “Hey Assistant, create a calendar appointment titled Beatles concert on August 29 at 8 pm at the Candlestick Park and add a note that the ticket costs $5.” If the user makes any misstatement or pauses too long in the middle of this long-winded voice command, then the user would have to cancel, erase, and start over.

However, consistent with the present concepts, the user 102 can simply tell the backpack 101, “Hey Assistant, add this to my calendar,” and the backpack 101 will add an appointment to the user's calendar for the event shown in the poster 400. For example, the microphone 112 on the backpack 101 may record the user's voice command and send the audio recording to the speech recognition module. The speech recognition module may then interpret the audio recording of the voice command into text. The cognitive module may interpret the text transcript of the voice command and recognize that the pronoun “this” is a contextual signal, and therefore, the voice command provided by the user 102 is a contextual voice command that references the environment surrounding the user 102. Accordingly, the camera 110 on the backpack 101 may be activated to record the environment. Where the camera 110 faces the front of the backpack 101, the camera 110 may be pointing at the same object in the environment that the user 102 is looking at. In this example scenario, the camera 110 may capture an image recording of the poster 400 and send it to the computer vision module. Then, the text recognition module may recognize the textual contents of the poster 400 (i.e., the name of the band, venue, date, time, and ticket price). Furthermore, the cognitive module may interpret the voice command and understand that the user 102 wishes to create a calendar event based on the contents of the poster 400. In response, the cognitive module may determine that a contextual task of adding an event to the user's calendar based on the poster 400 should be performed in response to the contextual voice command provided by the user 102. Accordingly, the digital assistant module may access the user's calendar and create a corresponding event in the user's calendar. Consistent with the present concepts, the user 102 can conveniently and quickly provide contextual voice commands to the backpack 101 without needing to divert his focus and attention away from the environment to a smartphone or another digital assistant device. Furthermore, the user 102 can provide a short natural voice command that references the environment in a normal way, as though he were talking to another person standing next to him, not like talking to a conventional digital assistant. Importantly, the backpack 101 can understand the contextual voice command that uses an indirect reference (i.e., the pronoun “this”) and perform the correct contextual task relating to the environment.
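
A compact sketch of that poster-to-calendar flow follows. The hard-coded poster text stands in for output of the text recognition module, and the parsing rules and event fields are illustrative assumptions rather than the described implementation.

```python
# Hypothetical sketch: turning recognized poster text into a calendar event.
# The parsing rules and event fields are illustrative assumptions.
import re

def parse_event_from_poster(poster_text: str) -> dict:
    """Pull a title, venue, date/time, and note out of recognized poster text."""
    title = poster_text.splitlines()[0].strip()
    venue = re.search(r"at ([\w ]+) on", poster_text)
    when = re.search(r"on (\w+, \w+ \d+ at [\d:]+ ?[ap]m)", poster_text)
    price = re.search(r"\$[\d.]+", poster_text)
    return {
        "title": title,
        "venue": venue.group(1) if venue else None,
        "when": when.group(1) if when else None,
        "note": f"Ticket price {price.group(0)}" if price else None,
    }

# Stand-in for what the text recognition module might return for the poster.
poster_text = (
    "Beatles concert\n"
    "at Candlestick Park on Monday, August 29 at 8:00 pm\n"
    "Tickets $5.00"
)
print(parse_event_from_poster(poster_text))
```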

Accordingly, the backpack 101 allows the user 102 to quickly, conveniently, and naturally provide contextual commands (including queries) to the backpack 101 that require the backpack 101 to perceive, sense, and/or understand the context of the user's command in reference to the environment, surroundings, or an object near the user 102. Whereas conventional digital assistants may be incapable of understanding, processing, or handling contextual commands, the wearable 100 consistent with the present concepts can use one or more sensors and artificial intelligence to perceive the environment and decipher contextual commands.

Many variations in configurations, implementations, and/or uses are possible with the present concepts. For example, the user 102 standing in front of the poster 400 may press the button 108 before speaking the voice command. Alternatively, the backpack 101 may always be in listening mode. The backpack 101 may be configured to recognize that the voice command feature is being activated by listening for a keyword or a hotword (i.e., a specific word or phrase, such as “Hey Assistant,” “Hi Backpack,” “computer,” or any other name, greeting, and/or phrase). Furthermore, the camera 110 may always be active or be configured to automatically activate whenever the user 102 provides a voice command. The recording of the environment may be captured before, during, or after the user 102 provides the contextual voice command. Depending on the complexity of the environmental context and/or the sophistication of the artificial intelligence engine 234, the user may adjust the level of detail in the contextual voice command. For example, the user 102 may specify, “Hey Backpack, add this poster to my calendar.” Or if there are two posters aligned vertically, the user 102 may specify, “Hey Assistant, add the bottom poster to my calendar.” Or if the user 102 is standing in front of a wall full of multiple posters, the user 102 may specify, “Hey Assistant, add the Beatles concert to my calendar.” Furthermore, as mentioned above, the speech recognition module, the cognitive module, the text recognition module, and/or the digital assistant module may be located in the backpack 101, the companion device 120, and/or the server 206.
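
As a simple illustration of the listening-mode variation, the sketch below checks each transcribed phrase for a hotword before treating the remainder as a command. The hotword list and transcript source are assumptions; a real system would typically detect hotwords in the audio stream rather than in text.

```python
# Hypothetical sketch: a hotword check in an always-listening loop.
# The hotword list and transcript source are illustrative assumptions.
from typing import Optional

HOTWORDS = ("hey assistant", "hi backpack", "hey backpack", "computer")

def extract_command(transcript: str) -> Optional[str]:
    """Return the command text if the transcript begins with a hotword."""
    lowered = transcript.lower().strip()
    for hotword in HOTWORDS:
        if lowered.startswith(hotword):
            return lowered[len(hotword):].lstrip(", ").strip()
    return None

for phrase in ["Hey Assistant, add this to my calendar", "nice weather today"]:
    command = extract_command(phrase)
    if command:
        print(f"Command detected: {command}")
    else:
        print("No hotword; ignoring.")
```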

Many other contextual voice commands may be provided by the user 102 to cause corresponding contextual tasks to be performed by the backpack 101. For example, the user 102 in FIG. 4 may tell the backpack 101, “Hey Assistant, add a task to buy a ticket for this event.” The cognitive module may interpret the contextual signal “this event” in the contextual voice command to refer to the Beatles concert by using the camera 110 and the computer vision module to perceive and interpret the poster 400 in the environment of the user 102. Furthermore, the cognitive module may understand that the user wishes to add a new task to his task list based on interpreting the voice command, and cause the digital assistant module to modify the user's task list by adding a new task accordingly. As another example, the user 102 in FIG. 4 may ask the backpack 101, “Who is this band?” In response, the artificial intelligence engine 234 may use the camera 110 to capture an image recording of the poster 400, the cognitive module may understand that the user is asking for information about the band named the Beatles, and the digital assistant may access one or more data sources to provide information about the Beatles to the user 102 by outputting an answer through the speaker 114: “The Beatles were a British rock music band popular in the 1960s. The band consisted of four men named John Lennon, Paul McCartney, George Harrison, and Ringo Starr. . . .” In this example, the backpack 101 may be able to provide information to the user 102 about the surrounding environment.

The backpack 101 may generate many different types of outputs to the user 102 depending on the voice command provided by the user 102, the task performed by the backpack 101, settings, preferences, scenarios, and/or situations. For example, the speaker 114 may output a shutter sound when the camera 110 is capturing an image of the environment. The LED display 116 may light up when the backpack 101 is listening. The haptic components 118 may vibrate in response to the user 102 saying “Hey Assistant” to let the user 102 know that the backpack 101 is ready to listen to the user's voice command. The speaker 114 may output a sound and/or the haptic component 118 may vibrate as feedback to the user 102 that a contextual task (e.g., adding an event to the calendar or adding a task to the task list) has been performed. The speaker 114 may generate a verbal output, such as a confirmation that a task has been performed (e.g., “Got it. I added the Beatles concert to your calendar.”), an answer to the user's query (e.g., “The weather today is sunny.”), or ask the user 102 a question (e.g., “When do you want me to remind you about the Beatles concert starting on Monday, August 29 at 8 pm?”).

Additional example scenarios will be provided to demonstrate the versatility in the functionality and usability of the wearable 100, consistent with the present concepts. For instance, the user 102 wearing the backpack 101 may be walking inside a store (e.g., in a mall), pick up a merchandise item (e.g., an apple in a grocery store, a hammer in a hardware store, a hat in an apparel store, etc.), and speak, “Hey Backpack, check this off my shopping list.” In response, the backpack 101 may use the camera 110 to capture an image of the merchandise item in front of the camera 110 (whether the user 102 is holding the merchandise item in his hand in front of the camera 110 or the merchandise item is sitting on a shelf in front of the camera 110), use the image recognition module to identify the voice command's reference to the merchandise item in the user's environment captured in the image, and use the digital assistant module to delete the merchandise item from the user's shopping list. (The act of identifying the voice command's reference to the merchandise item by the image recognition module will be explained in detail below in connection with FIG. 5.) The backpack 101 may use the speaker 114 to output a response, such as “Sure thing. A hammer has been checked off from your shopping list,” “There is no apple in your shopping list,” “Are you sure you want to remove the hat from your shopping list?” or “I'll delete blue pens from your shopping list the next time I'm connected to the Internet.”

The user 102 may also perform a price check on the merchandise item by asking the backpack 101, "Hey Assistant, how much is this?" The backpack 101 may use the GPS unit to determine that the user 102 is inside a particular store (e.g., Nordstrom, Target, Home Depot, Best Buy, etc.), use the digital assistant module to access the prices database of the appropriate store, and provide the user 102 with the requested information through the speaker 114: "That hat is $15.99." The backpack 101 may instruct the user: "Please hold the bar code in front of the camera." The user 102 may also perform a price comparison: "How much is it at JC Penney?" In response, the backpack 101 may check the price of the merchandise item at JC Penney and respond with: "That hat is $14.99 at JC Penney."

As another example, the user 102 may be wearing the backpack 101 and standing in front of a menu posted on a wall in front of a restaurant. The user 102 may ask the backpack 101, "How good is this place?" The backpack 101 may detect the contextual signal "this place" and interpret the context of the voice command as referring to a particular location. The backpack 101 may identify the referenced location as a particular restaurant by using the GPS unit to determine the location of the user 102 and/or using the camera to recognize an image of the restaurant name on the menu or on the restaurant storefront sign. Then, the backpack 101 may access a data source of restaurant ratings and reviews and inform the user 102, "This restaurant has a 4.5-star rating from 2,190 reviews at RateRestaurants.com." Furthermore, the user 102 may ask the backpack 101, "What's popular here?" In response, the backpack 101 may access a database of past orders or past reviews of specific dishes at this particular restaurant from one or more data sources, and answer the user's query with "The most popular entrees at this restaurant are the chicken alfredo and the meat lasagna, and the most popular dessert is the chocolate mousse cake." The user 102 may also ask the backpack 101, "What do you think I will like here?" In response, the backpack 101 may access the user's food preferences and/or cost preferences (or determine the user's food preferences and/or cost preferences based on the user's past orders) and recommend, "I think you will like the shrimp scampi for $12.99."

As another example, the user 102 may be wearing the backpack 101 in a city and say, "walk me home," to the backpack 101. In response, the backpack 101 may utilize the orientation of the user 102 (i.e., which way the user 102 is facing), use GPS, and access a map database and/or a navigation service to direct the user 102 to his home. For example, the backpack 101 may output auditory instructions via the speaker 114, such as "Okay, I'll take you home. Start by going straight," "turn right on Pacific Parkway," "turn left in 100 feet," or "use the crosswalk to cross Atlantic Avenue." Additionally or alternatively, the backpack 101 may activate the haptic components 118 on the left side and on the right side of the backpack 101 to signal to the user 102 that he should turn left or right, respectively. As another example, the user 102 wearing the backpack 101 may step off a bus or walk out of a train station and ask the backpack 101, "Which way to the library?" The backpack 101 can use the compass for the user's orientation, the GPS unit for the user's location, and/or accessible map data to guide and steer the user 102 by outputting through the speaker 114, "Go left and head south" or by outputting haptic vibrations on the left side of the user's body.
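
By way of illustration only, directional haptic guidance of this kind might be sketched as follows: the compass heading is compared with the bearing to the next waypoint, and the left or right haptic component is driven accordingly. The helper names and the 20-degree threshold below are assumptions made for the sketch, not details of the backpack 101.

    import math

    def bearing_to(lat1, lon1, lat2, lon2):
        # Initial great-circle bearing, in degrees, from point 1 to point 2.
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        y = math.sin(dlon) * math.cos(phi2)
        x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
        return math.degrees(math.atan2(y, x)) % 360.0

    def relative_turn(heading_deg, bearing_deg):
        # Signed angle in [-180, 180): negative means turn left, positive means turn right.
        return (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0

    def haptic_cue(heading_deg, user, waypoint, vibrate_left, vibrate_right):
        # Fire the left or right haptic actuator toward the next waypoint.
        turn = relative_turn(heading_deg, bearing_to(*user, *waypoint))
        if turn < -20.0:
            vibrate_left()
        elif turn > 20.0:
            vibrate_right()
        # Within +/- 20 degrees, the user keeps going straight and no cue is given.

    # Example: the user faces north (0 degrees) and the waypoint lies to the east.
    haptic_cue(0.0, (47.61, -122.33), (47.61, -122.32),
               lambda: print("buzz left strap"), lambda: print("buzz right strap"))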

In some implementations, the camera 110, the microphone 112, the GPS unit, the compass, and/or the biometric sensor may remain activated and recording to serve as a black box for security purposes in case there is an accident. For example, the camera 110 and the microphone 112 can act as a security body cam. The GPS unit, compass, and biometric sensor can record the user's location, direction, and biometric readings.

In addition to giving contextual commands that refer to the user's current location, the user 102 may also give contextual commands that refer to objects in the surroundings. For example, the user 102 in the city may say "guide me over that bridge" or the user 102 hiking in nature may say "direct me over that hill" or "how do I get to the other side of this river." In response, the backpack 101 may use the camera 110 to identify the appropriate contextual reference (e.g., the bridge, the hill, or the river in the image captured by the camera 110) in the user's voice command and perform the appropriate contextual task of navigating the user 102 to where he wants to go.

Furthermore, the present concepts can greatly benefit people with disabilities. For example, the backpack 101 may assist the visually impaired population (e.g., a blind person walking on foot). The backpack 101 may help the blind person navigate busy city streets by generating haptic and/or auditory feedback to instruct the blind person when and where to turn. The backpack 101 can see the environment around the blind person (including vehicles, traffic lights, crosswalk markings, other pedestrians, etc.) in real-time using the camera 110 and can safely guide the blind person. The backpack 101 may also assist the hard-of-hearing population (e.g., a deaf person). The backpack 101 may listen to sounds in the deaf person's environment and provide non-auditory (e.g., visual and/or haptic) feedback to the deaf person. For example, if a car honks at the deaf person while the deaf person is walking in the city, the backpack 101 may provide a strong haptic response to the deaf person's body in the same direction that the honk sound came from.

Furthermore, the user 102 may want to hail a cab ride (e.g., a taxi service or a rideshare service) after the user 102 has finished hiking or when the user 102 is walking in the city. The user 102 may tell the backpack 101, "Hey Backpack, order a car ride home." The backpack 101 may use the digital assistant module to access a car ride service, to order a ride for the user 102, and to respond with, "No problem. A red Toyota Corolla will be here in 7 minutes to pick you up."

As another example, the user 102 wearing the backpack 101 may be hiking in a foreign country, see a foreign language sign posted next to a lake, and ask the backpack 101, "Hey Assistant, what does this say?" In response, the backpack 101 may detect the contextual reference in the query from the user 102, use the camera 110 to determine the context of the query, access a language translation service, and inform the user 102 through the speaker 114, "That sign says ‘toxic waste dump, do not enter the water.’" Moreover, the backpack 101 can act as a personal translator while the user 102 is traveling in a foreign country. For example, the user 102 can tell the backpack 101, "Hey Backpack, tell this ticketing agent that I want to buy one ticket for the train to Paris that leaves in 15 minutes." In response, the backpack 101 can use the speaker 114 to tell the ticketing agent the translated message in the appropriate language based on the location of the user 102 as determined by GPS and/or based on the spoken language detected in the ambient sounds picked up by the microphone 112.

As another example, the user 102 wearing the backpack 101 may be standing in front of a movie poster or a billboard sign and ask, "Who is that?" In response, the backpack 101 may use the camera 110 to determine the context of the user's query by capturing an image of a person in the movie poster or the billboard sign, and interpret the user's query as asking for an identification of the person. The backpack 101 may use the facial recognition module to determine the identity of the person captured in the image and produce an output to the user 102: "That person is Kim Kardashian."

As another example, the user 102 wearing the backpack 101 may walk into a venue (e.g., a restaurant, a bar, a nightclub, an outdoor concert, etc.) and hear a song that the user 102 likes. The user 102 may ask the backpack 101, "Hey Assistant, what song is this?" "Who sings this?" or "Hey Assistant, download this song to my phone." In response, the backpack 101 may use the cognitive module to understand that the voice command from the user 102 is a contextual voice command that refers to a song playing in the environment of the user 102. Accordingly, the backpack 101 may record the song playing in the environment using the microphone 112 after the receipt of the voice command and/or use a recording of the environment including the song that was made before the user 102 provided the voice command. The backpack 101 may access a database of songs to identify the song playing in the user's surroundings, and perform the appropriate contextual task based on the user's contextual voice command. For example, the backpack 101 may audibly inform the user 102, using the speaker 114, of the identification of the song (e.g., "That song is Thriller by Michael Jackson.") or download a copy of the song to the user's phone (e.g., "The song has been downloaded to your phone."). The user 102 may provide contextual voice commands that relate to any sound in the environment to the backpack 101, such as animal sounds, instrument sounds, etc.

As another example, the user 102 wearing the backpack 101 may be hiking, biking, skiing, or snowboarding, etc. The user 102 may ask the backpack 101, "How much higher is that hill?" "How tall is that mountain?" "How long is that trail?" or "How steep is that slope?" In response, the backpack 101 may interpret the contextual voice command based on an image of the environment taken by the camera 110, the orientation of the user's body and/or head, the location of the user 102 determined by the GPS unit, and/or a map of the user's surroundings from a maps data source. The backpack 101 may determine the answer to the user's query based on the image of the user's environment (including an image capture of the hill, mountain, trail, or the slope), the location (including the elevation) of the user, a topographic map, and/or a trails map. The backpack 101 may output a response to the user's query using the speaker 114, such as "That hill is 350 feet higher than your elevation," "That mountain is 2,680 feet above sea level," "That trail is 2.8 miles long," or "That slope is 30% gradient." Accordingly, the user 102 can easily obtain information about his environment from the backpack 101 using contextual voice commands.

It should be evident from the example scenarios described above, including the example contextual commands and the example contextual tasks, that the artificial intelligence assisted wearable 100, which is capable of perceiving the environment and understanding contextual commands consistent with the present concepts, can have a wide range of applications, functionalities, and utility.

FIG. 5 illustrates an example image 500 of an environment having objects, consistent with the present concepts. Using one or more sensors, the artificial intelligence engine 234 may detect, recognize, and/or identify objects in the environment surrounding the user 102. For example, the image 500 shown in FIG. 5 may include the environment captured by the camera 110 as the user 102 wearing the backpack 101 is standing in a grocery store aisle and holding up bananas 502 in his hand 504. The user 102 may provide a contextual voice command to the backpack 101, such as "Remove this from my shopping list," "How much is this?" "Is this on sale?" "What did this cost last week?" "Do I need this?" or "How much is this at the next closest grocery store?" In response, the artificial intelligence engine 234 may recognize that the contextual voice command includes a contextual signal and therefore may determine the proper context of the user's voice command.

In this example, the image 500 may be processed by the computer vision module (for example, the image recognition module) to determine the context of the voice command. For instance, the image recognition module may include a deep learning neural network that has been trained to recognize and identify various objects, including grocery store merchandise objects. The neural network may be configured to recognize the objects in the image (e.g., the hand 504, the bananas 502, paper towels 506, canned tomatoes 508, soda bottles 510, shelves 512, etc.) as well as to calculate confidence values and/or prominence values associated with the identified objects. The confidence values may indicate how confident the image recognition module is about the identification of the associated object. The prominence values may indicate the prominence of the associated objects in the image 500 based on a plurality of factors, such as the size of the object relative to the image size, the size of the object relative to the sizes of other objects in the image 500, the location of the object in the image 500 (i.e., near the center versus on the outskirts), the focus of the object (i.e., crisp versus blurry), the number of similar objects in the image 500, whether any portion of the object is obscured or blocked, the brightness of the object compared to the brightness of the image 500 or other objects in the image 500, etc.
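
For illustration only, a prominence value of this kind might be approximated by combining a few of the listed factors into a single score, as in the following sketch. The DetectedObject fields and the weights are assumptions made for the sketch, not values disclosed for the image recognition module.

    from dataclasses import dataclass

    @dataclass
    class DetectedObject:
        label: str
        confidence: float      # 0..1, reported separately by the image recognition module
        area_frac: float       # object area divided by image area
        center_dist: float     # 0 at the image center, 1 at a corner
        sharpness: float       # 0 (blurry) to 1 (crisp)
        occluded_frac: float   # fraction of the object that is blocked

    def prominence(obj):
        # Toy prominence score: larger, more central, sharper, and less
        # occluded objects score higher. The weights are illustrative only.
        return (0.4 * obj.area_frac
                + 0.3 * (1.0 - obj.center_dist)
                + 0.2 * obj.sharpness
                + 0.1 * (1.0 - obj.occluded_frac))

    bananas = DetectedObject("bananas 502", 0.97, 0.30, 0.10, 0.9, 0.0)
    towels = DetectedObject("paper towels 506", 0.93, 0.12, 0.60, 0.7, 0.2)
    print(prominence(bananas) > prominence(towels))   # True: the bananas are more prominent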

Based on the image recognition module's identification of the objects in the image 500 and their rankings (e.g., based on their prominence values), the artificial intelligence engine 234 can determine the proper context of the user's contextual voice command. In this example scenario, the image recognition module may return a list of identified objects that includes the bananas 502 and a list of associated prominence values that includes the highest prominence value for the bananas 502. Accordingly, the artificial intelligence engine 234 may determine that the user's contextual voice command may be referring to the bananas 502 (as opposed to the paper towels 506, the canned tomatoes 508, the soda bottles 510, or the shelves 512). Therefore, the digital assistant module may, for example, remove bananas from the user's shopping list, inform the user 102 of the price of the bananas 502, inform the user 102 whether the bananas 502 are on sale, inform the user 102 of the price of bananas last week, inform the user 102 whether bananas are currently in his shopping list, or inform the user 102 of the price of bananas at the next closest grocery store, depending on the contextual voice command provided by the user 102. Furthermore, the digital assistant module may generate an auditory output that explicitly and specifically identifies the object in the environment that was indirectly referenced in the contextual voice command (i.e., "bananas" in this example) to provide the user 102 a confirmation that the backpack 101 perceived the environment and interpreted the contextual voice command correctly.

In another implementation, the artificial intelligence engine 234 may use the contextual signals in the voice command to further process the list of identified objects and their associated prominence values returned by the image recognition module. For instance, had the user 102 said, "Remove this fruit from my shopping list," the artificial intelligence engine 234 may register only the identified objects that fall into the fruit category even if there are non-fruit objects having higher prominence values identified in the image 500. Similarly, if the image 500 had included yellow bananas held up by the user's left hand and also included green bananas held up by the user's right hand, and the user 102 said "How much are the green ones?" the artificial intelligence engine 234 may consider only green-colored objects identified in the image 500 by the image recognition module.
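
Combining the two ideas above, one possible, purely illustrative way to resolve the referenced object is to first filter the recognized objects by any attribute named in the contextual signal, such as a category ("fruit") or a color ("green"), and then take the surviving object with the highest prominence value. The category table and tuple layout below are assumptions for the sketch.

    # Assumed, illustrative category table mapping a spoken category word to labels.
    CATEGORIES = {"fruit": {"bananas", "apple", "orange"}}

    def resolve_referent(detections, category=None, color=None):
        # detections: list of (label, color, prominence) tuples from the
        # image recognition module; returns the label most likely referenced.
        allowed = CATEGORIES.get(category) if category else None
        candidates = [(label, prom) for (label, clr, prom) in detections
                      if (allowed is None or label in allowed)
                      and (color is None or clr == color)]
        return max(candidates, key=lambda c: c[1])[0] if candidates else None

    # "Remove this fruit from my shopping list": the more prominent paper towels
    # are filtered out because they are not in the fruit category.
    scene = [("paper towels", "white", 0.62), ("bananas", "yellow", 0.55),
             ("canned tomatoes", "red", 0.31)]
    print(resolve_referent(scene, category="fruit"))   # prints "bananas"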

The above description is an example of how the artificial intelligence engine 234 can use the image recognition module to determine the relevant context associated with the voice command provided by the user 102. The artificial intelligence engine 234 consistent with the present concepts may use other modules to determine the relevant context associated with the voice command by sensing the environment surrounding the user 102. For example, text in the user's environment, such as in books, menus, signs, posters, billboards, storefronts, facades, labels, etc., may be recognized by the text recognition module. The text recognition module may use, for example, optical character recognition (OCR) techniques, including accounting for distortions, such as skews, blurs, smudges, warps, wraps, gaps, turns, twists, etc. The text recognition module may also use machine learning techniques and be trained to recognize dates, proper nouns (i.e., names), certain languages, etc.

Furthermore, faces in the user's environment (such as faces of pedestrians, faces of drivers, faces on posters, faces on billboards, faces on televisions, faces of pets, etc.) may be recognized by the facial recognition module. The recognized faces may be identified by referencing databases of known faces, such as address books with profile images for the contacts, social media networks, and photo galleries with identity tags. Accordingly, for example, the user 102 may be wearing the backpack 101 and standing in front of his friend Matt who just offered him a ride to the airport tomorrow. The user 102 can tell the backpack 101, "Hey Backpack, can you email Matt my flight itinerary?" Whereas a conventional digital assistant would have to ask the user 102 to specify which one of the dozen contacts named Matt in the user's address book he means, the backpack 101, consistent with the present concepts, can use the camera 110 and the facial recognition module to ascertain which Matt the user 102 is referring to in the contextual voice command. Indeed, the backpack 101 would have been able to correctly interpret the contextual voice command even if the user 102 had said "email my friend" or just "email him," because the backpack 101 can see what the user 102 sees, consistent with the present concepts. As another example, the user 102 may have met an associate at a business conference. The user 102 can simply say, "Hey Backpack, link this person in my professional social media network." In response, the backpack 101 may capture an image recording of the associate's face using the camera 110, search and find the associate's profile in the professional social media network, and connect the associate's profile to the user's account.
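
As a rough, illustrative sketch of such disambiguation, a face embedding taken from the camera could be compared against embeddings derived from address-book profile images, restricted to contacts whose names match the spoken reference. The cosine helper, the embedding vectors, and the 0.8 threshold are assumptions; a real facial recognition module would supply its own representations.

    def cosine(a, b):
        # Cosine similarity between two embedding vectors.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(y * y for y in b) ** 0.5
        return dot / (norm_a * norm_b)

    def disambiguate_contact(face_embedding, contacts, spoken_name=None, threshold=0.8):
        # contacts: list of dicts with 'name' and 'embedding' (from profile images).
        # Returns the best-matching contact, or None if no match is close enough.
        pool = [c for c in contacts
                if spoken_name is None or spoken_name.lower() in c["name"].lower()]
        scored = [(cosine(face_embedding, c["embedding"]), c) for c in pool]
        best_score, best = max(scored, key=lambda s: s[0], default=(0.0, None))
        return best if best_score >= threshold else None

    address_book = [{"name": "Matt Rivera", "embedding": [0.9, 0.1, 0.2]},
                    {"name": "Matt Chen", "embedding": [0.1, 0.9, 0.3]}]
    seen_face = [0.88, 0.12, 0.21]
    match = disambiguate_contact(seen_face, address_book, spoken_name="Matt")
    print(match["name"] if match else "ask the user which Matt")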

Similarly, sounds in the user's environment sensed by the microphone 112 may be interpreted by the audio recognition module. Where there are multiple simultaneous sounds, the audio recognition module may employ one or more filtering techniques, consider the relative volumes of the different sounds, and/or consider the contextual signals in the user's voice command (e.g., "What animal sound is that?" "Who sings this song?" etc.). The audio recognition module may be configured to recognize various types of sounds based on a set of rules and/or trained to recognize various types of sounds using training data of known sounds. Where the backpack 101 includes multiple microphones 112 positioned to pick up sounds emanating from different directions, the audio recognition module may be configured to weigh sounds according to their direction. For example, sounds coming from the front of the user 102 may be given more weight compared to sounds coming from behind the user 102.
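
The direction weighting mentioned above might, as a simplified sketch, scale each detected sound's volume by a factor that favors sounds arriving from in front of the user. The weighting curve and the shape of the sound records are assumptions for the sketch.

    import math

    def direction_weight(angle_deg):
        # Weight 1.0 for a sound straight ahead (0 degrees), tapering to 0.25
        # for a sound directly behind (180 degrees). Purely illustrative.
        ahead = math.cos(math.radians(angle_deg))     # 1 in front, -1 behind
        return 0.625 + 0.375 * ahead                  # maps [-1, 1] to [0.25, 1.0]

    def pick_dominant_sound(sounds):
        # sounds: list of dicts with 'label', 'volume', and 'angle_deg' keys.
        return max(sounds, key=lambda s: s["volume"] * direction_weight(s["angle_deg"]))

    ambient = [{"label": "car horn", "volume": 0.7, "angle_deg": 170},
               {"label": "song", "volume": 0.6, "angle_deg": 10}]
    print(pick_dominant_sound(ambient)["label"])      # prints "song"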

The artificial intelligence engine 234 may use the speech recognition module to recognize speech from the audio recording made of the sounds in the user's environment by the microphone 112. The speech recognition module may use rules-based techniques (e.g., grammatical rules) and/or machine learning-based techniques. The speech recognition module may be personally tailored or trained to better recognize the speech of the user 102 or more generally configured to better recognize the local dialect and/or language associated with the user 102 or the user's geographical location.

The artificial intelligence engine 234 may use the cognitive module to interpret and understand the contextual voice command provided by the user 102. In some implementations, the cognitive module may include natural language processing capabilities as well as human-computer interaction reasoning functionalities. Accordingly, the cognitive module can detect any contextual signals that refer to the user's environment and interpret contextual voice commands. The cognitive module may use other modules (e.g., the computer vision module or the audio recognition module) to perceive the environment. The cognitive module can also decide what computerized task should be performed in response to the voice command. In one implementation, the cognitive module can use the digital assistant module to carry out the computerized task. The cognitive module may be implemented based on defined rules and/or machine learning. Furthermore, the cognitive module may be continuously modified, for example, continuously learning based on the behavior of the user 102.
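
As a very rough sketch of how such a cognitive module might flag a contextual voice command, the transcribed text can be scanned for demonstratives and pronouns that point at the environment. The cue-word list and the return value below are assumptions, not a disclosed grammar or natural language processing pipeline.

    import re

    # Assumed, non-exhaustive cue words that can signal a reference to the environment.
    CONTEXTUAL_CUES = {"this", "that", "these", "those", "here", "there",
                       "him", "her", "it", "them"}

    def detect_contextual_signal(command_text):
        # Return the first contextual cue found in the command, or None if the
        # command appears to be non-contextual.
        for token in re.findall(r"[a-z']+", command_text.lower()):
            if token in CONTEXTUAL_CUES:
                return token
        return None

    print(detect_contextual_signal("Add a task to buy a ticket for this event"))  # "this"
    print(detect_contextual_signal("What is the weather today?"))                 # None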

The digital assistant module may be capable of performing a myriad of computerized tasks, such as executing commands, answering queries, searching for information, saving data, prompting users, etc. The digital assistant module may have access to various data sources, including the Internet, search engines, databases, social networks, the user's calendar, the user's address book, the user's task list, the user's shopping list, the user's accounts at various web sites, etc. The user 102 may personalize the digital assistant module or change relevant settings, such as selecting the voice, language, and/or dialect of the digital assistant module, adjusting the level of prompts, etc.

FIG. 6 shows a flowchart illustrating an example wearable method 600, consistent with the present concepts. For example, the wearable method 600 may be performed by a wearable, such as the backpack 101 or any other wearable, consistent with the present concepts.

In act 602, a command may be received. The command may include a request to perform a particular computerized task. The command may include a query seeking certain information. The command may be a contextual command that relates to an environment surrounding a user. That is, the command may include a contextual signal, such as a reference or a cue (e.g., a pronoun) relating to something about the environment. The command may have been provided by a user wearing the wearable and talking into a microphone on the wearable, or provided by the user via any other means.

In act 604, the environment may be sensed. For instance, a recording of the environment may be captured using one or more sensors. For example, the environment may be visually sensed by capturing a video recording using a camera, auditorily sensed by capturing an audio recording using a microphone, or sensed in any other way possible based on the capabilities of the available sensors. Accordingly, information about the environment may be included in a recording.

In act 606, the command and the sensed environment information may be transmitted to an artificial intelligence engine. For example, a recording of the command (e.g., an audio recording or a textual recording) as well as a recording of the environment (e.g., an image recording or a video recording) may be transmitted to the artificial intelligence engine. Where the artificial intelligence engine resides remotely from the wearable, the transmission may occur through one or more networks.

In act 608, a response to the command may be received. For instance, the response may be received from the artificial intelligence engine as a result of the command provided to it. Where the artificial intelligence engine is remote from the wearable, the response may be received through one or more networks. The response may be an answer to the query, an acknowledgment that the request has been executed, and/or a prompt to the user seeking clarification or more information. The response may include visual feedback, auditory feedback, and/or haptic feedback.

In act 610, the response may be output to the user. The response may be output using one or more output components of the wearable, or the response may be output using one or more output components that are accessible by the wearable.
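
For illustration only, the wearable-side flow of acts 602-610 might be organized as a single handler along the following lines. The device and engine objects, their method names, and the stub classes are assumptions made for the sketch.

    class StubDevice:
        # Stand-in for the wearable's sensors and output components.
        def record_command(self): return b"audio-bytes"
        def capture_image(self): return b"jpeg-bytes"
        def speak(self, text): print("speaker:", text)
        def vibrate(self, pattern): print("haptic:", pattern)

    class StubEngine:
        # Stand-in for a remote artificial intelligence engine.
        def send(self, command, environment):
            class Response:
                text, haptic = "Got it. Bananas removed from your shopping list.", None
            return Response()

    def handle_command(device, engine_client):
        # Illustrative wearable-side flow mirroring acts 602-610.
        command_audio = device.record_command()       # act 602: receive the command
        environment = device.capture_image()          # act 604: sense the environment
        response = engine_client.send(                # act 606: transmit both to the engine
            command=command_audio, environment=environment)
        # act 608: response received from the engine
        device.speak(response.text)                   # act 610: output the response
        if response.haptic:
            device.vibrate(response.haptic)

    handle_command(StubDevice(), StubEngine())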

FIG. 7 shows a flowchart illustrating an example server method 700, consistent with the present concepts. For example, the server method 700 may be performed by one or more server devices, such as the server 206 that includes the artificial intelligence engine 234, or any other device, consistent with the present concepts.

In act 702, a command and information about the environment may be received. For example, the command received in act 702 may be the command transmitted in act 606, and the environment information received in act 702 may be the environment information transmitted in act 606. Where the wearable and/or the sensors that received the command and/or the environment information reside remotely, the command and the environment information may be received through one or more networks. The command may include a request to perform a particular computerized task. The command may include a query seeking certain information. The command may be a contextual command that relates to an environment surrounding a user. That is, the command may include a contextual signal, such as a reference or a cue, relating to something about the environment. The command may have been provided by a user wearing the wearable and talking into a microphone on the wearable, or provided by the user via any other means. The command may be received in the form of an audio recording or a textual recording. Furthermore, the environment information may be received in the form of a recording captured using one or more sensors. For example, the environment may be visually sensed by capturing a video recording using a camera, auditorily sensed by capturing an audio recording using a microphone, or sensed in any other way possible based on the capabilities of the available sensors.

In act 704, the command may be interpreted. For example, an audio recording of the command may be converted into text using speech recognition. The text of the command may be further interpreted using a cognitive service to determine the meaning of the command, e.g., a request for a computerized action, a query seeking information, etc. Furthermore, consistent with the present concepts, a contextual signal in the command may be detected, where the contextual signal references the environment and/or the context associated with the command.

In act 706, the environment may be recognized. For example, computer vision may be used to recognize the environment from a visual recording of the environment. For instance, image recognition can be used to recognize objects in the environment, facial recognition may be used to recognize faces in the environment, and/or text recognition may be used to recognize text in the environment. Similarly, audio recognition may be used to recognize sounds in the environment.

In act 708, a computerized task may be performed. For instance, a cognitive service may determine a computerized task that should be performed based on the interpretation of the command and/or the recognition of the environment. Further, a digital assistant may perform the determined computerized task. The computerized task may be a contextual task that relates to the environment. As explained above, the digital assistant may be capable of performing a wide range of tasks, so long as the digital assistant has access to information and/or data necessary to perform the tasks.

In act 710, a response to the command may be transmitted. For example, the digital assistant may generate a response based on the command and/or the computerized task. The response may be transmitted to be output to the user who provided the command. The response transmitted in act 710 may be the response received in act 608. The response may be an answer to a query, an acknowledgment that a request has been executed, and/or a prompt to the user seeking clarification or more information. The response may include visual feedback, auditory feedback, and/or haptic feedback. Where the wearable is located remotely, the response may be transmitted through one or more networks.
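
Similarly, and again for illustration only, the server-side flow of acts 702-710 might be sketched as a single function that chains the modules described above. The module parameters and their method names are hypothetical stand-ins, not a disclosed interface.

    def serve_command(audio, image, speech, vision, cognition, assistant):
        # Illustrative server-side flow mirroring acts 702-710. The speech, vision,
        # cognition, and assistant arguments are hypothetical module objects.
        # act 702: the command (audio) and environment information (image) are received.
        text = speech.transcribe(audio)                  # act 704: interpret the command
        signal = cognition.find_contextual_signal(text)
        objects = vision.recognize(image)                # act 706: recognize the environment
        task = cognition.choose_task(text, signal, objects)
        result = assistant.perform(task)                 # act 708: perform the computerized task
        return assistant.compose_response(result)        # act 710: response to be transmitted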

The methods described above (including the wearable method 600 and the server method 700) and the acts thereof can be performed by any system, device, and/or component described above, and/or by any other system, device, and/or component capable of performing the described methods or acts. The methods can be implemented in any suitable hardware, software, firmware, or combination thereof. For example, the methods may be stored on one or more computer-readable storage media as a set of instructions (e.g., computer-readable instructions or computer-executable instructions) such that execution by a processor of a computing device causes the computing device to perform the method. The order in which the methods and acts are described is not intended to be construed as a limitation, and any of the described methods and/or acts can be combined in any order to implement the methods and/or acts, or alternate methods and/or acts.

Various examples are described above. Although the subject matter has been described in language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are presented as example forms of implementing the claims, and other features and acts that would be recognized by one skilled in the art are intended to be within the scope of the claims.

Additional examples are described below. One example includes a backpack comprising a strap including a camera that faces a front direction of a user when the user wears the backpack, a microphone, a speaker, a network interface, and a processor. The backpack also comprises a storage having instructions which, when executed by the processor, cause the processor to: receive a contextual voice command from the user via the microphone, the contextual voice command using a non-explicit reference to an object in an environment, capture an image of the environment including the object via the camera, transmit the contextual voice command and the image to an artificial intelligence engine via the network interface to cause a contextual task to be performed, the contextual task including a computerized action relating to the object, receive a response associated with the contextual task that was performed based at least on the contextual voice command, and output the response to the user via the speaker.

Another example can include any of the above and/or below examples where the backpack further comprises a compass, where the instructions further cause the processor to sense a direction that the user is facing via the compass.

Another example can include any of the above and/or below examples where the backpack further comprises a global positioning system (GPS) unit, where the instructions further cause the processor to determine a location of the user via the GPS unit.

Another example includes a system comprising a wearable, a sensor attached to the wearable, the sensor being fixed relative to a body of a user and capable of sensing an environment, and a processor. The system also comprises a storage having instructions which, when executed by the processor, cause the processor to: receive a contextual voice command that includes a pronoun to refer to an object in the environment, detect the object in the environment using the sensor, cause an artificial intelligence engine to perform a contextual task relating to the object in response to the contextual voice command, and output a response associated with the contextual task to the user.

Another example can include any of the above and/or below examples where the wearable includes a backpack.

Another example can include any of the above and/or below examples where the sensor includes a camera.

Another example can include any of the above and/or below examples where the camera is located in a strap of the wearable and facing a front direction of the user.

Another example can include any of the above and/or below examples where the system further comprises a speaker for outputting the response, wherein the response includes auditory feedback.

Another example can include any of the above and/or below examples where the system further comprises a light emitting diode for outputting the response, wherein the response includes visual feedback.

Another example can include any of the above and/or below examples where the system further comprises a haptic actuator for outputting the response, wherein the response includes haptic feedback.

Another example can include any of the above and/or below examples where the system further comprises a network interface for connecting to a network through a companion device that is capable of connecting to the network.

Another example can include any of the above and/or below examples where the system further comprises a battery for charging a companion device.

Another example includes a method comprising receiving a contextual voice command that references an object in an environment without explicitly identifying the object, capturing a recording of the environment including the object, using an artificial intelligence engine to determine an identification of the object and to interpret the contextual voice command based at least on the identification of the object, and causing a contextual task to be performed in response to the contextual voice command, the contextual task including a computerized action relating to the object in the environment.

Another example can include any of the above and/or below examples where the method further comprises using a speech recognition module to interpret the contextual voice command.

Another example can include any of the above and/or below examples where the recording includes one or more of: an audio recording, an image recording, and/or a video recording.

Another example can include any of the above and/or below examples where the method further comprises using an image recognition module to determine the identification of the object in the recording.

Another example can include any of the above and/or below examples where the method further comprises using a text recognition module to determine the identification of the object in the recording.

Another example can include any of the above and/or below examples where the method further comprises using a facial recognition module to determine the identification of the object in the recording.

Another example can include any of the above and/or below examples where the method further comprises using a cognitive module to determine the contextual task to be performed in response to the contextual voice command.

Another example can include any of the above and/or below examples where the method further comprises generating a response associated with the contextual task and transmitting the response to be output to a user.

1-20. (canceled)
21. A hands-free digital assistant device, comprising: a camera; a microphone; a speaker; a network interface; a processor; and a storage having instructions which, when executed by the processor, cause the processor to: receive a contextual voice command from the user via the microphone, wherein the contextual voice command uses a non-explicit reference to an object in an environment; capture an image of the environment including the object via the camera; cause transmission of the contextual voice command and the image to an engine via the network interface to cause a contextual task to be performed, the contextual task including a computerized action relating to the object; receive a response associated with the contextual task that was performed based at least on the contextual voice command; and output the response to the user via the speaker.
22. The device of claim 21, further comprising: a compass, wherein the instructions further cause the processor to sense a direction that the user is facing via the compass.
23. The device of claim 21, further comprising: a global positioning system (GPS) unit, wherein the instructions further cause the processor to determine a location of the user via the GPS unit.
24. A system for hands-free digital assistance, comprising: a wearable device; a sensor attached to the wearable device, the sensor being fixed relative to a body of a user and capable of sensing an environment; a processor; and a storage having instructions which, when executed by the processor, cause the processor to: receive a contextual voice command that includes a pronoun for a contextual signal that refers to an object in the environment; detect the object in the environment using the sensor; cause an engine to perform a contextual task relating to the object in response to the contextual voice command; and output a response associated with the contextual task to the user.
25. The system of claim 24, wherein the wearable device includes a backpack.
26. The system of claim 24, wherein the sensor includes a camera.
27. The system of claim 26, wherein the camera is located in a strap of the wearable device and facing a front direction of the user.
28. The system of claim 24, further comprising: a speaker for outputting the response, wherein the response includes auditory feedback.
29. The system of claim 24, further comprising: a light emitting diode for outputting the response, wherein the response includes visual feedback.
30. The system of claim 24, further comprising: a haptic actuator for outputting the response, wherein the response includes haptic feedback.
31. The system of claim 24, further comprising: a network interface for connecting to a network through a companion device that is capable of connecting to the network.
32. The system of claim 24, further comprising: a battery for charging a companion device.
33. A method for hands-free digital assistance, comprising: receiving a contextual voice command that makes a reference to an object in an environment without explicitly identifying the object; capturing a recording of the environment including the object; using an engine to determine an identification of the object and to interpret the contextual voice command based at least on the identification of the object; and causing a contextual task to be performed in response to the contextual voice command, the contextual task including a computerized action relating to the object in the environment.
34. The method of claim 33, further comprising: using a speech recognition module to interpret the contextual voice command.
35. The method of claim 33, wherein the recording includes one or more of: an audio recording, an image recording, and/or a video recording.
36. The method of claim 33, further comprising: using an image recognition module to determine the identification of the object in the recording.
37. The method of claim 33, further comprising: using a text recognition module to determine the identification of the object in the recording.
38. The method of claim 33, further comprising: using a facial recognition module to determine the identification of the object in the recording.
39. The method of claim 33, further comprising: using a cognitive module to determine the contextual task to be performed in response to the contextual voice command.
40. The method of claim 33, further comprising: generating a response associated with the contextual task; and transmitting the response to be output to a user.